
    The Influence of layout on the interpretation of referring expressions

    From the introduction: The division of text into visual segments such as sentences, paragraphs and sections serves many functions, such as easing navigation, achieving pragmatic effect, improving readability and reflecting the organisation of information (Wright, 1983; Schriver, 1997). In this paper, we report a small experiment that investigates the effect of different layout configurations on how readers interpret the antecedent of anaphoric referring expressions. Layout has so far played little role in Natural Language Generation (NLG) systems, and the layout of output texts is generally very simple. At worst, it consists of a single paragraph of a few sentences; at best, it is predetermined by schemas (Coch, 1996; Porter and Lester, 1997) or discourse plans (Milosavljevic, 1999). However, recent work by Power (2000) and Bouayad et al. (2000) has integrated graphically signalled segments (e.g., by whitespace, punctuation, and font and face alternation) such as paragraphs, lists, text-sentences and text-clauses into a hierarchical, tree-like representation called the document structure. This work was carried out within the ICONOCLAST project (Integrating CONstraints On Layout and Style), which aims at automatically generating formatted texts in which formatting decisions affect the wording and vice versa. If document structure affects the comprehensibility of referring expressions, this must be taken into account in any attempt to generate felicitous formatted texts. This goes a step beyond current research on the automatic generation of referring expressions, where only the effect of discourse structure and grammatical function has been investigated (Dale and Reiter, 1995; Cristea et al., 1998; Walker et al., 1998; Kibble and Power, 1999).
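    The document structure described above is a tree whose nodes are graphically signalled segments. As a minimal sketch, the Python below builds such a tree and reports the lowest level that two leaves share, one way to describe the layout configuration separating an anaphor from a candidate antecedent. The class and function names, the four-level inventory (lists are omitted) and the example leaflet text are hypothetical; this is not ICONOCLAST's actual representation.

```python
from dataclasses import dataclass, field

# Hypothetical level inventory, outermost first (labels taken from the abstract).
LEVELS = ("section", "paragraph", "text-sentence", "text-clause")


@dataclass
class DSNode:
    level: str
    text: str = ""
    children: "list[DSNode]" = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        for child in self.children:
            # A child must sit strictly lower in the hierarchy than its parent.
            if LEVELS.index(child.level) <= LEVELS.index(self.level):
                raise ValueError(f"{child.level} cannot be nested in {self.level}")


def leaf_paths(node, path=()):
    """Yield (leaf text, root-to-leaf path) for every leaf of the tree."""
    path = path + (node,)
    if not node.children:
        yield node.text, path
    for child in node.children:
        yield from leaf_paths(child, path)


def lowest_shared_level(root, text_a, text_b):
    """Level of the deepest node that dominates both leaves."""
    paths = dict(leaf_paths(root))
    shared = None
    for a, b in zip(paths[text_a], paths[text_b]):
        if a is not b:
            break
        shared = a.level
    return shared


doc = DSNode("section", children=[
    DSNode("paragraph", children=[
        DSNode("text-sentence", children=[
            DSNode("text-clause", "Take two tablets daily"),
            DSNode("text-clause", "unless your doctor advises otherwise"),
        ]),
    ]),
    DSNode("paragraph", children=[
        DSNode("text-sentence", children=[
            DSNode("text-clause", "They may cause drowsiness"),
        ]),
    ]),
])

print(lowest_shared_level(doc, "Take two tablets daily",
                          "unless your doctor advises otherwise"))  # text-sentence
print(lowest_shared_level(doc, "Take two tablets daily",
                          "They may cause drowsiness"))  # section
```

    Validation in `__post_init__` rejects trees where, for example, a paragraph is nested inside a text-sentence, which keeps the hierarchy well-formed by construction.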

    Discourse structuring of dynamic content

    One of Natural Language Generation's continuing challenges is to determine the structure and wording of the generated linguistic output in accordance with the expertise of the user, the content, the appropriate genre, style, etc. We focus on the determination of the discourse structure. Most often, it is assumed that the same discourse relation always holds between two given content units. Approaches in which the choice of discourse relations and the ordering of propositions depend on the interpretation of the content are still scarce. However, such an interpretation is extremely important, especially if the content is highly dynamic, as in the case of time series of data parameters. We present a text planner that takes into account the constraints imposed by dynamic data to make decisions at every stage of text planning, in particular for the selection of discourse relations and the ordering of propositions.
    The work reported in this paper was carried out within the MARQUIS project, funded by the European Commission under the eContent programme, contract number EDC-11258; duration: 2005-2007.
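    The abstract does not give the planner's rules. As a hedged illustration only, the sketch below assumes a simple interpretation step: a numeric time series is split into rising, falling and stable episodes, and the discourse relation between consecutive episodes is chosen from the data (a reversal of direction becomes a contrast, any other change a sequence). The names `Episode`, `segment`, `choose_relation` and `plan`, the relation labels and the tolerance parameter `tol` are all hypothetical, and ordering here is simply chronological.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    start: int   # index of the first sample in the run
    end: int     # index of the last sample in the run
    trend: str   # "rising", "falling" or "stable"


def _direction(delta, tol):
    if delta > tol:
        return "rising"
    if delta < -tol:
        return "falling"
    return "stable"


def segment(series, tol=0.0):
    """Split a series into maximal runs of steps with the same direction."""
    episodes = []
    for i in range(len(series) - 1):
        trend = _direction(series[i + 1] - series[i], tol)
        if episodes and episodes[-1].trend == trend:
            episodes[-1].end = i + 1  # extend the current run
        else:
            episodes.append(Episode(i, i + 1, trend))
    return episodes


def choose_relation(first, second):
    """Hypothetical rule: a reversal of direction is a CONTRAST,
    any other change of trend is a plain SEQUENCE."""
    if {first.trend, second.trend} == {"rising", "falling"}:
        return "contrast"
    return "sequence"


def plan(series, tol=0.0):
    """Chronologically ordered (episode, relation, next episode) triples."""
    eps = segment(series, tol)
    return [(a, choose_relation(a, b), b) for a, b in zip(eps, eps[1:])]


for a, rel, b in plan([3, 5, 8, 8, 6, 9]):
    print(a.trend, rel, b.trend)
# rising sequence stable
# stable sequence falling
# falling contrast rising
```

    Because `segment` merges steps with the same direction, two consecutive episodes always differ in trend, so the relation decision depends only on what kind of change the data shows at each boundary.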

    Can text structure be incompatible with rhetorical structure?

    Scott and Souza (1990) have posed the problem of how a rhetorical structure (in which propositions are linked by rhetorical relations but not yet arranged in a linear order) can be realized by a text structure (in which propositions are ordered and linked by appropriate discourse connectives). Almost all work on this problem assumes, implicitly or explicitly, that this mapping is governed by a constraint on compatibility of structure. We show how this constraint can be stated precisely, and present some counterexamples that seem acceptable even though they violate compatibility. The examples are based on a phenomenon we call extraposition, in which complex embedded constituents of a rhetorical structure are extracted and realized separately.
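    One common way to make a compatibility constraint precise, assumed here and not necessarily the formulation used in the paper, is to require that the propositions dominated by every node of the rhetorical tree occupy a contiguous stretch of the linear text order. The sketch encodes a rhetorical structure as nested tuples `(relation, child, ...)` with proposition labels as strings (a hypothetical encoding) and checks a proposed order against that requirement.

```python
def leaves(tree):
    """Propositions of a rhetorical (sub)tree, left to right.

    A tree is either a proposition label (str) or a tuple
    (relation_name, child, child, ...)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


def is_compatible(tree, order):
    """True if every subtree's propositions form one contiguous block of `order`."""
    if sorted(order) != sorted(leaves(tree)):
        raise ValueError("order must contain each proposition exactly once")
    pos = {p: i for i, p in enumerate(order)}

    def contiguous(node):
        if isinstance(node, str):
            return True
        span = [pos[p] for p in leaves(node)]
        if max(span) - min(span) + 1 != len(span):
            return False  # the subtree is split by material from outside it
        return all(contiguous(child) for child in node[1:])

    return contiguous(tree)


tree = ("evidence", ("cause", "a", "b"), "c")
print(is_compatible(tree, ["a", "b", "c"]))  # True
print(is_compatible(tree, ["c", "a", "b"]))  # True
print(is_compatible(tree, ["a", "c", "b"]))  # False
```

    In the order `a c b`, the embedded proposition `b` is pulled out of its `cause` subtree and realized separately, so the check fails; this illustrates how extraposition violates compatibility in this contiguity sense.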

    FootbOWL: Using a generic ontology of football competition for planning match summaries

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-21034-1_16. Proceedings of the 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29-June 2, 2011.
    We present a two-layer OWL ontology-based Knowledge Base (KB) that allows for flexible content selection and discourse structuring in Natural Language text Generation (NLG), and discuss its use for these two tasks. The first layer of the ontology contains an application-independent base ontology. It models the domain and was not designed with NLG in mind. The second layer, added on top of the base ontology, models entities and events that can be inferred from the base ontology, including inferable logico-semantic relations between individuals. The nodes in the KB are weighted according to learnt models of content selection, so that a subset of them can be extracted. The extraction is done using templates that also consider semantic relations between the nodes and a simple user profile. The discourse structuring submodule maps the semantic relations to discourse relations and forms discourse units, which it then arranges into a coherent discourse graph. The approach is illustrated and evaluated on a KB that models the First Spanish Football League.
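    The abstract describes weighting KB nodes, extracting a subset, and mapping semantic relations between the extracted nodes to discourse relations. The plain-Python sketch below mimics those steps under simplifying assumptions: node weights and relation triples are given as dicts and tuples rather than read from an OWL ontology, a top-`k` cut with an optional per-node `profile` multiplier stands in for the template-based extraction, and the node names, relation names and `RELATION_MAP` table are hypothetical.

```python
# Hypothetical mapping from KB semantic relations to discourse relations.
RELATION_MAP = {
    "causes": "cause",
    "precedes": "sequence",
    "opposes": "contrast",
}


def select(weights, k, profile=None):
    """Keep the k highest-scoring nodes; a profile multiplies node weights.

    Ties are broken alphabetically so the result is deterministic."""
    profile = profile or {}
    score = {n: w * profile.get(n, 1.0) for n, w in weights.items()}
    ranked = sorted(score, key=lambda n: (-score[n], n))
    return set(ranked[:k])


def discourse_graph(weights, relations, k, profile=None):
    """Selected nodes plus discourse edges between them."""
    chosen = select(weights, k, profile)
    edges = [(src, RELATION_MAP[rel], dst)
             for src, rel, dst in relations
             if src in chosen and dst in chosen and rel in RELATION_MAP]
    return chosen, edges


weights = {"win": 0.95, "goal_1": 0.9, "red_card": 0.7, "corner_3": 0.2}
relations = [("goal_1", "causes", "win"),
             ("red_card", "precedes", "goal_1"),
             ("corner_3", "precedes", "goal_1")]
chosen, edges = discourse_graph(weights, relations, k=3)
print(sorted(chosen))  # ['goal_1', 'red_card', 'win']
print(edges)  # [('goal_1', 'cause', 'win'), ('red_card', 'sequence', 'goal_1')]
```

    Relations whose endpoints were not both selected are dropped, so content selection directly determines which discourse relations can appear in the graph.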

    Duplication in Corpora

    We investigate duplication, a pervasive problem in NLP corpora. We present a method for finding it that uses word frequency list comparisons, and experiment with this method on different units of duplication. From the introduction: Most corpora contain repeated material. In sampled corpora like the Brown Corpus, duplication is less of an issue, since the linguistic data is carefully selected in proportion to genre, which reduces the risk of introducing unwanted duplication. However, the typical corpus used in NLP is one in which as much data as possible of the desired genre is gathered. The result is a corpus whose nature and content are largely unknown. To our knowledge, this issue has not been discussed in the literature before. While we may expect the repeated occurrence of words or expressions to reflect their use in the language, the repetition of longer stretches of printed material (section-, paragraph- or even sentence-length) most likely does not. Text processing technolog..
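    The abstract does not specify how frequency lists are compared. A minimal sketch, assuming cosine similarity between the word frequency lists of two units and a hypothetical `threshold` of 0.95, flags pairs of units (documents, paragraphs or sentences) as likely duplicates. The function names are illustrative, not the authors'.

```python
import math
import re
from collections import Counter
from itertools import combinations


def freq_list(text):
    """Word frequency list of one unit (lower-cased word tokens)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two frequency lists."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find_duplicates(units, threshold=0.95):
    """Index pairs of units whose frequency lists are near-identical."""
    lists = [freq_list(u) for u in units]
    return [(i, j) for i, j in combinations(range(len(units)), 2)
            if cosine(lists[i], lists[j]) >= threshold]


units = ["The cat sat on the mat.", "the cat sat on the mat", "A dog barked loudly."]
print(find_duplicates(units))  # [(0, 1)]
```

    Because a frequency list ignores word order, this comparison also matches reorderings of the same words, and the all-pairs loop is quadratic in the number of units, so a large corpus would need some bucketing or indexing before comparison.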

    Layout Annotation in a Corpus of Patient Information Leaflets

    We discuss the problems and issues that arose during the development of a procedure for annotating layout in a corpus of Patient Information Leaflets. We show how the genre of the corpus and the aim of the annotation influenced the annotation scheme. We also describe the automatic annotation procedure.